NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Automated Validating and Fixing of Text-to-SQL Translation with Execution Consistency

https://doi.org/10.1145/3725271

Yang, Yicun; Wang, Zhaoguo; Xia, Yu; Wei, Zhuoran; Ding, Haoran; Piskac, Ruzica; Chen, Haibo; Li, Jinyang (June 2025, Proceedings of the ACM on Management of Data)

State-of-the-art Text-to-SQL models rely on fine-tuning or few-shot prompting to help LLMs learn from training datasets containing mappings from natural language (NL) queries to SQL statements. Consequently, the quality of the dataset can greatly affect the accuracy of these Text-to-SQL models. Unlike other NL tasks, Text-to-SQL datasets are prone to errors despite extensive manual efforts due to the subtle semantics of SQL. Our study has found a non-negligible (>30%) portion of incorrect NL to SQL mapping cases exists in popular datasets Spider and BIRD. This paper aims to improve the quality of Text-to-SQL training datasets and thereby increase the accuracy of the resulting models. To do so, we propose a necessary correctness condition called execution consistency. For a given database instance, an NL to SQL mapping satisfies execution consistency if the execution result of an NL query matches that of the corresponding SQL. We develop SQLDriller to detect incorrect NL to SQL mappings based on execution consistency in a best-effort manner by crafting database instances that likely result in violations of execution consistency. It generates multiple candidate SQL predictions that differ in their syntax structures. Using a SQL equivalence checker, SQLDriller obtains counterexample database instances that can distinguish non-equivalent candidate SQLs. It then checks the execution consistency of an NL to SQL mapping under this set of counterexamples. The evaluation shows SQLDriller effectively detects and fixes incorrect mappings in the Text-to-SQL dataset, and it improves the model accuracy by up to 13.6%.
more » « less
Free, publicly-accessible full text available June 17, 2026
Proving Query Equivalence Using Linear Integer Arithmetic

https://doi.org/10.1145/3626768

Ding, Haoran; Wang, Zhaoguo; Yang, Yicun; Zhang, Dexin; Xu, Zhenglin; Chen, Haibo; Piskac, Ruzica; Li, Jinyang (December 2023, Proceedings of the ACM on Management of Data)

Proving the equivalence between SQL queries is a fundamental problem in database research. Existing solvers model queries using algebraic representations and convert such representations into first-order logic formulas so that query equivalence can be verified by solving a satisfiability problem. The main challenge lies in unbounded summations, which appear commonly in a query's algebraic representation in order to model common SQL features, such as projection and aggregate functions. Unfortunately, existing solvers handle unbounded summations in an ad-hoc manner based on heuristics or syntax comparison, which severely limits the set of queries that can be supported. This paper develops a new SQL equivalence prover called SQLSolver, which can handle unbounded summations in a principled way. Our key insight is to use the theory of LIA^*, which extends linear integer arithmetic formulas with unbounded sums and provides algorithms to translate a LIA^* formula to a LIA formula that can be decided using existing SMT solvers. We augment the basic LIA^* theory to handle several complex scenarios (such as nested unbounded summations) that arise from modeling real-world queries. We evaluate SQLSolver with 359 equivalent query pairs derived from the SQL rewrite rules in Calcite and Spark SQL. SQLSolver successfully proves 346 pairs of them, which significantly outperforms existing provers.
more » « less
Full Text Available
Bringing Decentralized Search to Decentralized Services

Li, Mingyu; Zhu, Jinhao; Zhang, Tianxu; Tan, Cheng; Xia, Yubin; Angel, Sebastian; Chen, Haibo (July 2021, Symposium on Operating Systems Design and Implementation (OSDI))

This paper addresses a key missing piece in the current ecosystem of decentralized services and blockchain apps: the lack of decentralized, verifiable, and private search. Existing decentralized systems like Steemit, OpenBazaar, and the growing number of blockchain apps provide alternatives to existing services. And yet, they continue to rely on centralized search engines and indexers to help users access the content they seek and navigate the apps. Such centralized engines are in a perfect position to censor content and violate users’ privacy, undermining some of the key tenets behind decentralization. To remedy this, we introduce DeSearch, the first decentralized search engine that guarantees the integrity and privacy of search results for decentralized services and blockchain apps. DeSearch uses trusted hardware to build a network of workers that execute a pipeline of small search engine tasks (crawl, index, aggregate, rank, query). DeSearch then introduces a witness mechanism to make sure the completed tasks can be reused across different pipelines, and to make the final search results verifiable by end users. We implement DeSearch for two existing decentralized services that handle over 80 million records and 240 GBs of data, and show that DeSearch can scale horizontally with the number of workers and can process 128 million search queries per day.
more » « less
Full Text Available
Polar Isotactic and Syndiotactic Polypropylenes by Organozirconium‐Catalyzed Masking‐Reagent‐Free Propylene and Amino–Olefin Copolymerization

https://doi.org/DOI:10.1002/anie.202005635

Huang, Minglu; Chen, Jiazhen; Wang, Binghao; Huang, Wei; Chen, Haibo; Gao, Yanshan; Marks, Tobin J. (October 2020, Angewandte Chemie International Edition)

Full Text Available

Search for: All records